The Climatological Database for the World's Oceans (CLIWOC) is a project funded by the European Union to study ocean climatology by scanning and analysing ships' logbooks dating from 1662 to 1855. Drawing on records from a dozen countries and hundreds of ships, the project produced a detailed dataset of more than 280,000 data points, offering rich insight into the rise of sea exploration and crucial climate information from a pre-digital world.


Each logbook entry has been converted into an observation, and for each of the 280,280 points we have access to a range of information: the names of the ship, its captain and first lieutenant, the observed air and sea temperatures, the wind speed and direction, and the geographical coordinates.
Using the coordinates, it is possible to plot each entry on a world map and obtain an accurate picture of sea travel and exploration during this period.

The chronology of the records reflects the rise and fall of the successive European empires and their reign over the seas.

Counting the records, however, might give a biased picture, as it is doubtful that every ship recorded data with the same frequency.

Since we have access to the ships' names, we can instead count distinct ships to estimate the size of each country's fleet at each point in time. Even so, there is no guarantee that the journals from each country were sampled with the same probability in every year, or across the entire timeline.
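A minimal sketch of that counting step, in Python, might look like the following. The field names (`nationality`, `year`, `ship`) and the toy records are assumptions made for illustration; they are not the actual CLIWOC column names or data.

```python
from collections import defaultdict


def ships_per_country_year(records):
    """Count distinct ship names for each (nationality, year) pair.

    Several logbook entries from the same ship in the same year count once,
    which removes the bias caused by ships that logged more often.
    """
    seen = defaultdict(set)
    for rec in records:
        seen[(rec["nationality"], rec["year"])].add(rec["ship"])
    return {key: len(names) for key, names in seen.items()}


# Made-up records, for illustration only (hypothetical field names).
records = [
    {"nationality": "British", "year": 1780, "ship": "Ship A"},
    {"nationality": "British", "year": 1780, "ship": "Ship A"},
    {"nationality": "British", "year": 1780, "ship": "Ship B"},
    {"nationality": "French", "year": 1780, "ship": "Ship C"},
    {"nationality": "French", "year": 1781, "ship": "Ship C"},
]

fleet_counts = ships_per_country_year(records)
for (nationality, year), n_ships in sorted(fleet_counts.items()):
    print(nationality, year, n_ships)
```

Using a set per group means the two entries for "Ship A" contribute a single ship, whereas a plain record count would give three British observations for that year.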

Here the enduring supremacy of the British Empire's fleet appears clearly. We can also see the rise and fall of the Spanish Armada, and the end of the French fleet after the crushing defeat at Trafalgar in 1805.



Since the dataset also provides the names of the ships' captains, it is possible to follow some of the famous expeditions of the time. We can trace the voyage of James Cook around the globe, follow La Perouse until his disappearance in Oceania, and track D'Auribeau's expedition that went looking for him.


But, as its name implies, the original purpose of this dataset is to open up a previously unexplored period of meteorological data. Because some of the journal entries are accompanied by reasonably accurate readings of temperature, weather, wind speed and other meteorological observations, it is possible to reconstruct parts of the weather of this period.

The data is, however, very sparse. Out of the 280,280 observations, only about 56,000 contain a valid temperature reading. Spread over a century, that is an average of roughly 560 readings per year for the entire planet, which is not enough to build a really accurate climate model. Nonetheless, we can try to extract some trends for the entire period, and see how temperature extremes might have evolved.

If we restrict ourselves to the Atlantic Ocean and aggregate the data over ten-year periods, some trends might start to appear.
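The decade aggregation can be sketched as follows, in Python. This is only an illustration under stated assumptions: the Atlantic is approximated by a crude latitude/longitude bounding box chosen here for convenience, the field names are hypothetical, and the observations are made up.

```python
from collections import defaultdict

# Assumption: a rough bounding box standing in for the Atlantic Ocean.
# A real analysis would use a proper ocean basin mask instead.
ATLANTIC_LAT = (-60.0, 65.0)
ATLANTIC_LON = (-80.0, 20.0)


def in_atlantic(lat, lon):
    """Return True if the point falls inside the crude bounding box."""
    return (ATLANTIC_LAT[0] <= lat <= ATLANTIC_LAT[1]
            and ATLANTIC_LON[0] <= lon <= ATLANTIC_LON[1])


def decade_extremes(observations):
    """Group valid Atlantic temperature readings by decade.

    Returns {decade_start: (min_temp, max_temp, n_readings)}.
    """
    by_decade = defaultdict(list)
    for obs in observations:
        if obs["temp"] is None:  # skip entries without a valid reading
            continue
        if not in_atlantic(obs["lat"], obs["lon"]):
            continue
        decade = (obs["year"] // 10) * 10  # e.g. 1789 -> 1780
        by_decade[decade].append(obs["temp"])
    return {d: (min(t), max(t), len(t)) for d, t in sorted(by_decade.items())}


# Made-up observations, for illustration only (temperatures in Celsius).
observations = [
    {"year": 1781, "lat": 40.0, "lon": -30.0, "temp": 18.5},
    {"year": 1789, "lat": 10.0, "lon": -25.0, "temp": 27.0},
    {"year": 1792, "lat": 35.0, "lon": -40.0, "temp": None},
    {"year": 1795, "lat": -20.0, "lon": 80.0, "temp": 25.0},  # outside the box
    {"year": 1795, "lat": 50.0, "lon": -20.0, "temp": 9.0},
]

extremes = decade_extremes(observations)
for decade, (t_min, t_max, n) in extremes.items():
    print(decade, t_min, t_max, n)
```

Keeping the number of readings next to each minimum and maximum makes it easier to judge how much weight a given decade deserves, since sparse decades produce unreliable extremes.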

Additionally, we can use the spatial nature of the data to get a distribution of the extreme temperatures over the entire Atlantic Ocean. We separate the data into three distinct periods (1775-1800, 1800-1825 and 1825-1855), round the geographical coordinates to create a "grid" of squares of approximately 200,000 km², then take the maximum and minimum temperatures in each square. We can then fit a surface through all these points to obtain a continuous distribution over the entire ocean.
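The gridding step can be sketched in Python as below. Several details are assumptions rather than facts from the analysis: the cell size of 4 degrees is chosen because a 4° by 4° cell covers roughly 200,000 km² near the equator, the period boundaries are treated as half-open intervals so that no year falls in two periods, and the field names and observations are invented for illustration. The surface-fitting step is not shown.

```python
CELL_DEG = 4.0  # assumption: ~445 km per side at the equator, ~200,000 km² per cell

# Assumption: half-open intervals [start, end), with 1855 included in the last one.
PERIODS = [
    ("1775-1800", 1775, 1800),
    ("1800-1825", 1800, 1825),
    ("1825-1855", 1825, 1856),
]


def period_of(year):
    """Return the label of the period containing `year`, or None."""
    for label, start, end in PERIODS:
        if start <= year < end:
            return label
    return None


def snap(value, step=CELL_DEG):
    """Round a coordinate to the nearest multiple of `step` degrees."""
    return round(value / step) * step


def grid_extremes(observations):
    """Return {(period, cell_lat, cell_lon): (min_temp, max_temp)}."""
    cells = {}
    for obs in observations:
        period = period_of(obs["year"])
        if period is None or obs["temp"] is None:
            continue
        key = (period, snap(obs["lat"]), snap(obs["lon"]))
        low, high = cells.get(key, (obs["temp"], obs["temp"]))
        cells[key] = (min(low, obs["temp"]), max(high, obs["temp"]))
    return cells


# Made-up observations, for illustration only (temperatures in Celsius).
observations = [
    {"year": 1780, "lat": 9.0, "lon": -31.0, "temp": 24.0},
    {"year": 1790, "lat": 7.5, "lon": -32.5, "temp": 28.0},
    {"year": 1810, "lat": 9.0, "lon": -31.0, "temp": 22.0},
    {"year": 1700, "lat": 9.0, "lon": -31.0, "temp": 30.0},  # before the first period
]

cells = grid_extremes(observations)
for key, (t_min, t_max) in sorted(cells.items()):
    print(key, t_min, t_max)
```

Note that cells defined in degrees shrink towards the poles, so the 200,000 km² figure only holds at low latitudes; an equal-area grid would avoid this at the cost of a more involved binning function.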


We can also explore how the likelihood of extreme weather events evolved over the years:


Their geographic distribution can be mapped as well:

All in all, the sparsity of the dataset makes it hard to draw firm conclusions about climate from it. The data is likely heavily biased towards certain routes and nationalities, and towards different methods of recording entries. Exploiting it fully would probably require combining it with other sources of data from the same period. Nonetheless, this is probably the first time that so much data has been available for a period before the advent of digital technology. Many more insights can probably be drawn from this data, from a climate science perspective but also from a historical or sociological one.